Understanding Document Aboutness Step Two: Identifying Interesting Things
Abstract
We define the notion of an interesting nugget in a document. Such nuggets attract a user's attention and lead them to explore more information about that nugget. To measure and model interestingness, we examine browsing sessions within Wikipedia and build a data set of transitions (clickthroughs) from a source Wikipedia page to a destination Wikipedia page through anchor clicks. We investigate the factors that influence the probability of a click on an anchor in a Wikipedia page, and we propose a topic modeling approach that jointly models the contents of the source and destination pages. We then use the estimated posterior over the latent variables as features, along with page structure and user metadata features, to build a model of interestingness. Finally, we evaluate this model with different feature sets and demonstrate its effectiveness at predicting interesting nuggets. Experimental results show that the latent semantic features are effective in predicting interestingness and can outperform baseline features.
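As a rough illustration of the transition data set described in the abstract, the sketch below counts (source, destination) pairs in browsing sessions and turns them into empirical click shares. It is not the authors' model: the function name `click_probabilities`, the session format (a list of page titles in visit order), and the toy sessions are all assumptions made for this example, and the share is computed only over observed clicks rather than over page views.

```python
from collections import Counter


def click_probabilities(sessions):
    """Estimate the share of clicks from each source page to each destination.

    Assumption: each session is a list of page titles in visit order, and
    every consecutive pair is one (source, destination) transition.
    """
    pair_counts = Counter()
    source_counts = Counter()
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            pair_counts[(src, dst)] += 1
            source_counts[src] += 1
    # Normalize each pair count by the total clicks leaving its source page.
    return {
        (src, dst): n / source_counts[src]
        for (src, dst), n in pair_counts.items()
    }


# Hypothetical toy sessions, invented for illustration only.
sessions = [
    ["Seattle", "Space Needle"],
    ["Seattle", "Space Needle", "World's Fair"],
    ["Seattle", "Pike Place Market"],
]
probs = click_probabilities(sessions)
```

In this toy data, two of the three clicks leaving "Seattle" go to "Space Needle", so that pair receives a share of 2/3. A model of interestingness in the sense of the abstract would instead predict such probabilities from latent topic, page structure, and user metadata features.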
Similar resources
Understanding Document Aboutness - Step One: Identifying Salient Entities
We propose a system that determines the salience of entities within web documents. Many recent advances in commercial search engines leverage the identification of entities in web pages. However, for many pages, only a small subset of entities is important, or central, to the document, which can lead to degraded relevance for entity-triggered experiences. We address this problem by devising a ...
Preliminary Analyses of Information Features Provided by Users for Identifying Music
This paper presents preliminary findings based on the analyses of user-provided information features found in 566 queries seeking help in the identification of particular music works or artists. Queries were drawn from the answers.google.com (Google Answers) website. The types and frequency of occurrences of different information features are compared with the results from previous studies of m...
An Implementation of Symbolic Aboutness Theory
Today information can be shared globally via the Internet and accessed from anywhere in the world. The increasing complexity and size of the WWW create a need for more effective information processing techniques, such as information retrieval and filtering, information summarization, topic segmentation, data mining and information discovery, etc. All of them can be fundamentall...
Aboutness from a commonsense perspective
Information retrieval (IR) is driven by a process which decides whether a document is about a query. Recent attempts spawned from logic-based information retrieval theory have formalized properties characterizing “aboutness”, but no consensus has yet been reached. The proposed properties are largely determined by the underlying framework within which aboutness is defined. In addition, some prop...
A commonsense aboutness theory for information retrieval modeling
Information retrieval (IR) can be viewed as a process to determine the “aboutness”, or sometimes “relevance”, relationship between information carriers (e.g. document and query). Thus, the concept of aboutness lies at the heart of IR. A better understanding of aboutness would lead to more effective IR systems. In this paper, we give a review of the status of current research on aboutness. It is...